    HSTREAM: A directive-based language extension for heterogeneous stream computing

    Big data streaming applications require the use of heterogeneous parallel computing systems, which may comprise multiple multi-core CPUs and many-core accelerating devices such as NVIDIA GPUs and Intel Xeon Phis. Programming such systems requires advanced knowledge of several hardware architectures and device-specific programming models, including OpenMP and CUDA. In this paper, we present HSTREAM, a compiler directive-based language extension that supports programming stream computing applications for heterogeneous parallel computing systems. The HSTREAM source-to-source compiler aims to increase programming productivity by letting programmers annotate the parallel regions intended for heterogeneous execution and then generating target-specific code from those annotations. The HSTREAM runtime automatically distributes the workload across CPUs and accelerating devices. We demonstrate the usefulness of the HSTREAM language extension with various applications from the STREAM benchmark. Experimental evaluation results show that HSTREAM keeps the same programming simplicity as OpenMP, and that the generated code outperforms both CPU-only and GPU-only executions. Comment: Preprint, 21st IEEE International Conference on Computational Science and Engineering (CSE 2018)
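    The abstract does not say how the HSTREAM runtime decides each device's share of the work. As a rough illustration only, the sketch below assumes a static policy that splits a batch of stream items in proportion to each device's measured throughput. The function name `split_workload` and the throughput figures are hypothetical and are not part of HSTREAM.

```python
def split_workload(n_items, throughputs):
    """Split n_items across devices in proportion to their assumed throughputs.

    Hypothetical helper (not HSTREAM's API). Uses the largest-remainder method
    so the per-device shares always sum to exactly n_items.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    total = sum(throughputs.values())
    if total <= 0:
        raise ValueError("need at least one device with positive throughput")
    ideal = {dev: n_items * tp / total for dev, tp in throughputs.items()}
    shares = {dev: int(x) for dev, x in ideal.items()}  # floor of each ideal share
    leftover = n_items - sum(shares.values())
    # Hand the remaining items to the devices with the largest fractional parts.
    for dev in sorted(ideal, key=lambda d: ideal[d] - shares[d], reverse=True)[:leftover]:
        shares[dev] += 1
    return shares


# Hypothetical throughputs (items/ms): the GPU is assumed to be 3x faster than the CPUs.
print(split_workload(1000, {"cpu": 1.0, "gpu": 3.0}))  # {'cpu': 250, 'gpu': 750}
```

    A fixed proportional split like this fits regular, predictable kernels; if device throughput varied during a run, the shares would have to be re-measured and recomputed between batches.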

    Analyzing large-scale DNA Sequences on Multi-core Architectures

    Rapid analysis of DNA sequences is important for preventing the evolution of different viruses and bacteria at an early phase, for the early diagnosis of genetic predispositions to certain diseases (cancer, cardiovascular diseases), and for DNA forensics. However, real-world DNA sequences may comprise several gigabytes, and DNA analysis demands adequate computational resources to complete within a reasonable time. In this paper we present a scalable approach for parallel DNA analysis that is based on finite automata and is suitable for analyzing very large DNA segments. We evaluate our approach on real-world DNA segments of mouse (2.7 GB), cat (2.4 GB), dog (2.4 GB), chicken (1 GB), human (3.2 GB), and turkey (0.2 GB). Experimental results on a dual-socket shared-memory system with 24 physical cores show speed-ups of up to 17.6x. Our approach is up to 3x faster than a pattern-based parallel approach that uses the RE2 library. Comment: The 18th IEEE International Conference on Computational Science and Engineering (CSE 2015), Porto, Portugal, 20-23 October 2015
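    The abstract does not describe how the automata are built. As a minimal sketch, assuming a single fixed DNA motif over the alphabet ACGT, the code below builds a standard KMP-style deterministic finite automaton and scans a sequence once, counting possibly overlapping occurrences. `build_motif_dfa` and `count_occurrences` are illustrative names, not the authors' code.

```python
ALPHABET = "ACGT"


def build_motif_dfa(motif):
    """KMP-style DFA: state q means the last q symbols read equal motif[:q]."""
    if not motif or any(c not in ALPHABET for c in motif):
        raise ValueError("motif must be a non-empty string over ACGT")
    m = len(motif)
    delta = [dict.fromkeys(ALPHABET, 0) for _ in range(m + 1)]
    delta[0][motif[0]] = 1
    restart = 0  # state reached after reading motif[1:q]
    for q in range(1, m + 1):
        delta[q] = dict(delta[restart])  # on a mismatch, behave like the restart state
        if q < m:
            delta[q][motif[q]] = q + 1  # on a match, advance one state
            restart = delta[restart][motif[q]]
    return delta


def count_occurrences(sequence, motif):
    """Count (possibly overlapping) occurrences of motif in one linear scan."""
    delta = build_motif_dfa(motif)
    accept = len(motif)
    state, count = 0, 0
    for c in sequence:
        state = delta[state].get(c, 0)  # symbols outside ACGT (e.g. N) reset the DFA
        if state == accept:
            count += 1
    return count


print(count_occurrences("ACGACGACG", "ACG"))  # 3
print(count_occurrences("AAAA", "AA"))        # 3 (overlapping matches)
```

    Each input character costs exactly one table lookup, so the scan is linear in the sequence length regardless of the motif, which is one reason automata suit multi-gigabyte inputs.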

    PaREM: A Novel Approach for Parallel Regular Expression Matching

    Regular expression matching is essential for many applications, such as finding patterns in text, exploring substrings in large DNA sequences, or lexical analysis. However, sequential regular expression matching may be prohibitively slow for large problem sizes. In this paper, we describe a novel algorithm for parallel regular expression matching via deterministic finite automata. Furthermore, we present our tool PaREM, which accepts regular expressions and finite automata as input and automatically generates the corresponding code for our algorithm, amenable to parallel execution on shared-memory systems. We evaluate our parallel algorithm empirically by comparing it with a commonly used algorithm for sequential regular expression matching. Experiments on a dual-socket shared-memory system with 24 physical cores show speed-ups of up to 21x for 48 threads. Comment: CSE-2014, Dec. 19th - 21st, 2014, Chengdu, Sichuan, China
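    PaREM's own parallel algorithm is described in the paper and is not reproduced here. As a hedged stand-in, the sketch below shows a widely used enumeration-based scheme for parallel DFA matching: each chunk of the input is simulated from every DFA state, and a cheap sequential pass then stitches the chunks together once each chunk's real start state is known. The three-state DFA for the motif `GA` and all function names are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical example DFA over ACGT that enters state 2 whenever "GA" has just been read.
GA_DFA = {
    0: {"A": 0, "C": 0, "G": 1, "T": 0},
    1: {"A": 2, "C": 0, "G": 1, "T": 0},
    2: {"A": 0, "C": 0, "G": 1, "T": 0},
}


def sequential_match(delta, start, accepting, text):
    """Reference scan: return (final state, number of accepting states entered)."""
    state, hits = start, 0
    for c in text:
        state = delta[state][c]
        if state in accepting:
            hits += 1
    return state, hits


def run_chunk(delta, accepting, chunk):
    """Simulate the DFA on one chunk from every possible start state."""
    return {s: sequential_match(delta, s, accepting, chunk) for s in delta}


def parallel_match(delta, start, accepting, text, n_chunks=4):
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # Phase 1 (parallel): chunks are independent because no start state is assumed.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        tables = list(pool.map(lambda ch: run_chunk(delta, accepting, ch), chunks))
    # Phase 2 (sequential): follow the real state through each chunk's table.
    state, hits = start, 0
    for table in tables:
        state, chunk_hits = table[state]
        hits += chunk_hits
    return state, hits


print(parallel_match(GA_DFA, 0, {2}, "GAGATTGACGA", n_chunks=3))  # (2, 4)
```

    Each chunk does |Q| times the work of a sequential scan (one pass per DFA state), so this scheme pays off mainly for small automata or many cores. In CPython, `ThreadPoolExecutor` illustrates the structure but does not speed up this pure-Python loop because of the global interpreter lock; real gains would need processes or a compiled language.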

    Using Cognitive Computing for Learning Parallel Programming: An IBM Watson Solution

    While modern parallel computing systems provide high-performance resources, utilizing them to the fullest extent requires advanced programming expertise. Programming parallel computing systems is much more difficult than programming sequential systems. OpenMP extends the C, C++, and Fortran programming languages, enabling programmers to express parallelism using compiler directives. While OpenMP eases parallel programming by reducing the lines of code that the programmer needs to write, deciding how and when to use these compiler directives is still up to the programmer. Novice programmers may make mistakes that lead to performance degradation or unexpected program behavior. Cognitive computing has shown impressive results in various domains, such as health and marketing. In this paper, we describe the use of the IBM Watson cognitive system for educating novice parallel programmers. Using the dialogue service of IBM Watson, we have developed a solution that assists the programmer in avoiding common OpenMP mistakes. To evaluate our approach, we conducted a survey with a number of novice parallel programmers at Linnaeus University and obtained encouraging results with respect to the usefulness of our approach.

    Programming and Optimization of Big-Data Applications on Heterogeneous Computing Systems

    Next-generation sequencing instruments enable biological researchers to generate voluminous amounts of data. In the near future, genomics is projected to become the largest source of big data. A major challenge of big data is the efficient analysis of very large data sets. Modern heterogeneous parallel computing systems, which comprise multiple CPUs, GPUs, and Intel Xeon Phis, can cope with the requirements of big-data analysis applications. However, utilizing these resources to their fullest extent demands advanced knowledge of various hardware architectures and programming frameworks. Furthermore, optimized software execution on such systems requires considering many compile-time and run-time system parameters. In this thesis, we study and develop parallel pattern matching algorithms for heterogeneous computing systems. We apply our pattern matching algorithm to DNA sequence analysis. Experimental evaluation results show that our parallel algorithm achieves more than 50x speedup when executed on host CPUs and more than 30x when executed on the Intel Xeon Phi, compared to the sequential version executed on the CPU. Thereafter, we combine machine learning and search-based meta-heuristics to determine near-optimal parameter configurations of parallel matching algorithms for efficient execution on heterogeneous computing systems. We use our approach to distribute the workload of the DNA sequence analysis application across the available host CPUs and accelerating devices, and to determine the system configuration parameters of a heterogeneous system that comprises Intel Xeon CPUs and a Xeon Phi accelerator. Experimental results show that execution using both the host CPUs and the accelerating device outperforms host-only and device-only executions. Finally, we propose programming abstractions, a source-to-source compiler, and a run-time system for heterogeneous stream computing. Given source code annotated with compiler directives, the source-to-source compiler can generate device-specific code. The run-time system can automatically distribute the workload across the available host CPUs and accelerating devices. Experimental results show that our solution significantly reduces programming effort and that the generated code outperforms CPU-only and GPU-only executions.
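    The thesis combines machine learning with search-based meta-heuristics, but the abstract does not name the specific heuristic or model. The sketch below assumes simulated annealing over two parameters, the number of host threads and the fraction of work offloaded to the accelerator, and replaces the learned performance model with a made-up analytic cost function `predicted_time`. Every constant and name in it is hypothetical.

```python
import math
import random

MAX_THREADS = 48  # assumed host thread limit
FRAC_STEPS = 20   # offload fraction searched in steps of 1/20 (5 %)


def predicted_time(threads, frac, work=100.0):
    """Hypothetical stand-in for a learned performance model (seconds).

    The host processes (1 - frac) of the work with `threads` threads plus a small
    per-thread overhead; the accelerator processes `frac` of the work and pays a
    fixed offload cost. Both run concurrently, so the slower side dominates.
    """
    host = (1 - frac) * work / (0.5 * threads) + 0.02 * threads
    device = frac * work / 20.0 + (1.0 if frac > 0 else 0.0)
    return max(host, device)


def cost(config):
    threads, step = config
    return predicted_time(threads, step / FRAC_STEPS)


def anneal(steps=2000, seed=0):
    """Simulated annealing over (threads, offload step); returns the best seen."""
    rng = random.Random(seed)
    current = (1, 0)  # start: one host thread, nothing offloaded
    current_cost = cost(current)
    best, best_cost = current, current_cost
    for k in range(steps):
        temp = 1.0 - k / steps + 1e-3  # linear cooling schedule
        threads, step = current
        candidate = (
            min(MAX_THREADS, max(1, threads + rng.choice((-4, -1, 1, 4)))),
            min(FRAC_STEPS, max(0, step + rng.choice((-1, 1)))),
        )
        cand_cost = cost(candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_cost < current_cost or rng.random() < math.exp((current_cost - cand_cost) / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


best, best_cost = anneal()
print(f"threads={best[0]}, offload={best[1] / FRAC_STEPS:.0%}, predicted {best_cost:.2f} s")
```

    On this toy 48 x 21 grid an exhaustive search would be cheap; a meta-heuristic becomes worthwhile when the number of parameters grows, and querying a model instead of running the application keeps each evaluation inexpensive.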

    Automatic Java Code Generator for Regular Expressions and Finite Automata
